Overview

Dataset statistics

Number of variables: 18
Number of observations: 15000
Missing cells: 2890
Missing cells (%): 1.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 7.9 MiB
Average record size in memory: 551.5 B
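
The missing-cell percentage follows from the counts above: the table holds 18 × 15000 cells, of which 2890 are missing. A minimal sketch of that arithmetic (the variable names are illustrative and not part of the report):

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 18
n_observations = 15_000
missing_cells = 2_890

# Share of missing cells over the whole table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Share of missing values within a single column of 15000 rows.
column_missing_pct = 100 * missing_cells / n_observations

print(f"Missing cells: {missing_pct:.1f}% of {total_cells} cells")
print(f"Missing within one column: {column_missing_pct:.1f}%")
```

The second figure matches the alert for POSSIBLENterm, the only variable the report flags as having missing values.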

Variable types

Text: 2
Numeric: 11
Boolean: 1
Categorical: 4

Alerts

POSSIBLENterm has constant value "True" (Constant)
Insidesource has constant value "TMHMM2.0" (Constant)
TMhelixsource has constant value "TMHMM2.0" (Constant)
Outsidesource has constant value "TMHMM2.0" (Constant)
ExpnumberofAAsinTMHs is highly overall correlated with Insideend and 5 other fields (High correlation)
Insideend is highly overall correlated with ExpnumberofAAsinTMHs and 5 other fields (High correlation)
Insidestart is highly overall correlated with ExpnumberofAAsinTMHs and 4 other fields (High correlation)
Length is highly overall correlated with Insideend and 1 other field (High correlation)
Outsideend is highly overall correlated with Length and 3 other fields (High correlation)
Outsidestart is highly overall correlated with ExpnumberofAAsinTMHs and 4 other fields (High correlation)
PredictedTMHsNumber is highly overall correlated with ExpnumberofAAsinTMHs and 5 other fields (High correlation)
TMhelixend is highly overall correlated with ExpnumberofAAsinTMHs and 6 other fields (High correlation)
TMhelixstart is highly overall correlated with ExpnumberofAAsinTMHs and 6 other fields (High correlation)
POSSIBLENterm has 2890 (19.3%) missing values (Missing)
Protein_ID has unique values (Unique)
Expnumberfirst60AAs has 765 (5.1%) zeros (Zeros)
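
As a rough illustration of what the Constant, Missing, Unique, and Zeros alerts mean, the sketch below derives the same labels for a single column. It is not the profiling tool's implementation: the helper `column_alerts` and the 5% zero threshold are assumptions made only for this example.

```python
def column_alerts(values, zero_threshold=0.05):
    """Return alert labels for one column (hypothetical helper, not the tool's API)."""
    present = [v for v in values if v is not None]
    alerts = []
    # Constant: every non-missing value is the same.
    if present and len(set(present)) == 1:
        alerts.append("Constant")
    # Missing: at least one value is absent.
    if len(present) < len(values):
        alerts.append("Missing")
    # Unique: no missing values and no repeats.
    if len(values) > 1 and len(set(present)) == len(values):
        alerts.append("Unique")
    # Zeros: share of numeric zeros above the assumed threshold (booleans excluded).
    zeros = sum(1 for v in present if not isinstance(v, bool) and v == 0)
    if values and zeros / len(values) > zero_threshold:
        alerts.append("Zeros")
    return alerts


print(column_alerts([True, True, None, True]))  # ['Constant', 'Missing']
print(column_alerts(["a_1", "b_2", "c_3"]))     # ['Unique']
print(column_alerts([0, 1.5, 2.5, 0]))          # ['Zeros']
```

This is how a column such as POSSIBLENterm can be both Constant and Missing at once: its only observed value is True, and the remaining rows are empty.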

Reproduction

Analysis started: 2025-07-15 11:52:32.405867
Analysis finished: 2025-07-15 11:52:44.979108
Duration: 12.57 seconds
Software version: ydata-profiling v0.0.dev0
Download configuration: config.json

Variables

Distinct: 14674
Distinct (%): 97.8%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 MiB

Length

Max length: 87
Median length: 86
Mean length: 23.835467
Min length: 6

Characters and Unicode

Total characters: 357532
Distinct characters: 65
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
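
To make the idea of per-code-point properties concrete, here is a small sketch that counts Unicode general categories with Python's standard `unicodedata` module. The helper `category_counts` is illustrative, and the input strings are the five values listed in this variable's Sample table.

```python
import unicodedata
from collections import Counter

# Sample values copied from this variable's "Sample" table.
samples = [
    "MGV-GENOME-0377366",
    "MGV-GENOME-0228589",
    "TemPhD_cluster_54944",
    "TemPhD_cluster_21940",
    "uvig_280215",
]


def category_counts(values):
    """Count Unicode general categories (such as 'Lu' or 'Nd') over all characters."""
    counts = Counter()
    for value in values:
        counts.update(unicodedata.category(ch) for ch in value)
    return counts


counts = category_counts(samples)
for category, n in counts.most_common():
    print(category, n)
```

In this output, digits fall under `Nd`, hyphens under `Pd`, and underscores under `Pc`, which is the kind of breakdown a category table would show when the properties are resolved rather than reported as "(unknown)".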

Unique

Unique: 14361
Unique (%): 95.7%

Sample

1st row: MGV-GENOME-0377366
2nd row: MGV-GENOME-0228589
3rd row: TemPhD_cluster_54944
4th row: TemPhD_cluster_21940
5th row: uvig_280215
ValueCountFrequency (%)
mgv-genome-0340415 3
 
< 0.1%
temphd_cluster_8028 3
 
< 0.1%
mycobacterium_phage_gadjet 3
 
< 0.1%
otu_72 3
 
< 0.1%
uvig_15691 3
 
< 0.1%
uvig_200799 3
 
< 0.1%
uvig_588980 3
 
< 0.1%
nc_042116.1 3
 
< 0.1%
station168_sur_all_assembly_node_176_length_220308_cov_73.630021 3
 
< 0.1%
temphd_cluster_9569 3
 
< 0.1%
Other values (14664) 14970
99.8%

Most occurring characters

ValueCountFrequency (%)
_ 34456
 
9.6%
1 18585
 
5.2%
0 16049
 
4.5%
3 15294
 
4.3%
2 14762
 
4.1%
E 12304
 
3.4%
5 12134
 
3.4%
4 12091
 
3.4%
M 11196
 
3.1%
7 10981
 
3.1%
Other values (55) 199680
55.8%

Most occurring categories

ValueCountFrequency (%)
(unknown) 357532
100.0%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 357532
100.0%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 357532
100.0%

Protein_ID
Text

Unique 

Distinct: 15000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 MiB

Length

Max length: 90
Median length: 88
Mean length: 26.651867
Min length: 9

Characters and Unicode

Total characters: 399778
Distinct characters: 65
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 15000
Unique (%): 100.0%

Sample

1st row: MGV-GENOME-0377366_94
2nd row: MGV-GENOME-0228589_3
3rd row: TemPhD_cluster_54944_50
4th row: TemPhD_cluster_21940_29
5th row: uvig_280215_16
ValueCountFrequency (%)
station137_mes_combined_final_node_8849_length_10087_cov_2.107157_9 1
 
< 0.1%
station155_dcm_all_assembly_node_4760_length_17116_cov_5.143016_14 1
 
< 0.1%
mgv-genome-0377366_94 1
 
< 0.1%
mgv-genome-0228589_3 1
 
< 0.1%
temphd_cluster_54944_50 1
 
< 0.1%
temphd_cluster_21940_29 1
 
< 0.1%
uvig_280215_16 1
 
< 0.1%
temphd_cluster_2820_6 1
 
< 0.1%
uvig_396803_67 1
 
< 0.1%
mgv-genome-0085121_16 1
 
< 0.1%
Other values (14990) 14990
99.9%

Most occurring characters

ValueCountFrequency (%)
_ 49101
 
12.3%
1 23704
 
5.9%
2 18767
 
4.7%
3 18764
 
4.7%
0 17832
 
4.5%
4 14966
 
3.7%
5 14590
 
3.6%
6 13121
 
3.3%
7 12953
 
3.2%
8 12495
 
3.1%
Other values (55) 203485
50.9%

Most occurring categories

ValueCountFrequency (%)
(unknown) 399778
100.0%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 399778
100.0%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 399778
100.0%

Length
Real number (ℝ)

High correlation 

Distinct: 1150
Distinct (%): 7.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 216.97947
Minimum: 23
Maximum: 5055
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 234.4 KiB

Quantile statistics

Minimum: 23
5-th percentile: 47
Q1: 81
median: 128
Q3: 213
95-th percentile: 781
Maximum: 5055
Range: 5032
Interquartile range (IQR): 132

Descriptive statistics

Standard deviation: 283.11148
Coefficient of variation (CV): 1.3047847
Kurtosis: 30.426127
Mean: 216.97947
Median Absolute Deviation (MAD): 57
Skewness: 4.3630647
Sum: 3254692
Variance: 80152.111
Monotonicity: Not monotonic
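
Several of the figures above are derived from the others: the range is maximum minus minimum, the IQR is Q3 minus Q1, the coefficient of variation is the standard deviation divided by the mean, and the variance is the squared standard deviation. A short sketch that recomputes them from the reported values for Length (inputs are rounded as printed in the report, so the results agree only up to that rounding):

```python
# Values copied from the Length statistics.
minimum, maximum = 23, 5055
q1, q3 = 81, 213
mean, std = 216.97947, 283.11148
total, n = 3254692, 15000

value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range
cv = std / mean                  # Coefficient of variation
variance = std ** 2              # Variance
mean_from_sum = total / n        # Mean recomputed from Sum and row count

print(value_range, iqr)
print(round(cv, 5), round(variance, 2), round(mean_from_sum, 5))
```

The mean (about 217) sits well above the median (128), which is consistent with the strongly positive skewness reported for this variable.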
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
71 120
 
0.8%
66 114
 
0.8%
107 114
 
0.8%
68 111
 
0.7%
60 110
 
0.7%
128 109
 
0.7%
78 103
 
0.7%
91 102
 
0.7%
69 101
 
0.7%
88 101
 
0.7%
Other values (1140) 13915
92.8%
ValueCountFrequency (%)
23 1
 
< 0.1%
25 1
 
< 0.1%
26 1
 
< 0.1%
27 1
 
< 0.1%
28 2
 
< 0.1%
29 33
0.2%
30 29
0.2%
31 37
0.2%
32 41
0.3%
33 33
0.2%
ValueCountFrequency (%)
5055 1
< 0.1%
4711 1
< 0.1%
3789 1
< 0.1%
3582 1
< 0.1%
3366 1
< 0.1%
3283 1
< 0.1%
3027 1
< 0.1%
2861 1
< 0.1%
2857 1
< 0.1%
2705 1
< 0.1%

PredictedTMHsNumber
Real number (ℝ)

High correlation 

Distinct: 24
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.8842
Minimum: 1
Maximum: 26
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 234.4 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
median: 1
Q3: 2
95-th percentile: 5
Maximum: 26
Range: 25
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.846858
Coefficient of variation (CV): 0.9801815
Kurtosis: 27.806686
Mean: 1.8842
Median Absolute Deviation (MAD): 0
Skewness: 4.4740398
Sum: 28263
Variance: 3.4108844
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=24)
ValueCountFrequency (%)
1 8806
58.7%
2 3750
25.0%
3 1081
 
7.2%
4 561
 
3.7%
5 211
 
1.4%
6 180
 
1.2%
10 78
 
0.5%
7 66
 
0.4%
8 64
 
0.4%
11 45
 
0.3%
Other values (14) 158
 
1.1%
ValueCountFrequency (%)
1 8806
58.7%
2 3750
25.0%
3 1081
 
7.2%
4 561
 
3.7%
5 211
 
1.4%
6 180
 
1.2%
7 66
 
0.4%
8 64
 
0.4%
9 42
 
0.3%
10 78
 
0.5%
ValueCountFrequency (%)
26 1
 
< 0.1%
24 1
 
< 0.1%
22 1
 
< 0.1%
21 1
 
< 0.1%
20 8
0.1%
19 1
 
< 0.1%
18 11
0.1%
17 2
 
< 0.1%
16 12
0.1%
15 8
0.1%

ExpnumberofAAsinTMHs
Real number (ℝ)

High correlation 

Distinct: 13589
Distinct (%): 90.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 41.601879
Minimum: 9.06624
Maximum: 558.81059
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 234.4 KiB

Quantile statistics

Minimum: 9.06624
5-th percentile: 17.401948
Q1: 20.867945
median: 23.1689
Q3: 44.302967
95-th percentile: 110.20577
Maximum: 558.81059
Range: 549.74435
Interquartile range (IQR): 23.435022

Descriptive statistics

Standard deviation: 42.842182
Coefficient of variation (CV): 1.0298136
Kurtosis: 27.936477
Mean: 41.601879
Median Absolute Deviation (MAD): 5.542635
Skewness: 4.4509783
Sum: 624028.19
Variance: 1835.4525
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
24.87583 19
 
0.1%
18.23661 17
 
0.1%
47.86547 16
 
0.1%
210.43458 16
 
0.1%
108.33627 14
 
0.1%
39.48045 13
 
0.1%
36.04048 13
 
0.1%
108.28257 11
 
0.1%
21.42924 11
 
0.1%
46.4987 10
 
0.1%
Other values (13579) 14860
99.1%
ValueCountFrequency (%)
9.06624 1
< 0.1%
9.72277 1
< 0.1%
10.14356 1
< 0.1%
10.23364 1
< 0.1%
10.68052 1
< 0.1%
11.23047 2
< 0.1%
11.41795 1
< 0.1%
11.44372 1
< 0.1%
11.49807 1
< 0.1%
11.49934 1
< 0.1%
ValueCountFrequency (%)
558.81059 1
< 0.1%
527.93568 1
< 0.1%
520.15842 1
< 0.1%
511.60373 1
< 0.1%
494.58563 1
< 0.1%
491.00928 1
< 0.1%
473.96137 1
< 0.1%
473.24722 1
< 0.1%
470.79895 1
< 0.1%
470.76217 1
< 0.1%

Expnumberfirst60AAs
Real number (ℝ)

Zeros 

Distinct: 12383
Distinct (%): 82.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20.585516
Minimum: 0
Maximum: 49.17952
Zeros: 765
Zeros (%): 5.1%
Negative: 0
Negative (%): 0.0%
Memory size: 234.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 16.499458
median: 21.23728
Q3: 25.40934
95-th percentile: 41.447687
Maximum: 49.17952
Range: 49.17952
Interquartile range (IQR): 8.9098825

Descriptive statistics

Standard deviation: 12.224024
Coefficient of variation (CV): 0.59381673
Kurtosis: -0.46734641
Mean: 20.585516
Median Absolute Deviation (MAD): 4.53789
Skewness: -0.11228669
Sum: 308782.75
Variance: 149.42677
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0 765
 
5.1%
0.00018 33
 
0.2%
42.15085 23
 
0.2%
24.87583 19
 
0.1%
18.23661 17
 
0.1%
0.00055 16
 
0.1%
25.0819 16
 
0.1%
8 × 10⁻⁵ 15
 
0.1%
38.11529 14
 
0.1%
1 × 10⁻⁵ 14
 
0.1%
Other values (12373) 14068
93.8%
ValueCountFrequency (%)
0 765
5.1%
1 × 10⁻⁵ 14
 
0.1%
2 × 10⁻⁵ 8
 
0.1%
3 × 10⁻⁵ 11
 
0.1%
4 × 10⁻⁵ 5
 
< 0.1%
5 × 10⁻⁵ 3
 
< 0.1%
6 × 10⁻⁵ 10
 
0.1%
7 × 10⁻⁵ 4
 
< 0.1%
8 × 10⁻⁵ 15
 
0.1%
9 × 10⁻⁵ 8
 
0.1%
ValueCountFrequency (%)
49.17952 1
< 0.1%
47.80002 1
< 0.1%
47.7514 1
< 0.1%
47.03513 1
< 0.1%
46.73738 1
< 0.1%
46.71216 1
< 0.1%
46.14895 1
< 0.1%
46.10417 1
< 0.1%
46.02934 1
< 0.1%
45.99137 1
< 0.1%

TotalprobofNin
Real number (ℝ)

Distinct: 11967
Distinct (%): 79.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.59128324
Minimum: 4 × 10⁻⁵
Maximum: 1
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 234.4 KiB

Quantile statistics

Minimum: 4 × 10⁻⁵
5-th percentile: 0.0225355
Q1: 0.2392375
median: 0.69571
Q3: 0.92887
95-th percentile: 0.9962505
Maximum: 1
Range: 0.99996
Interquartile range (IQR): 0.6896325

Descriptive statistics

Standard deviation: 0.35351111
Coefficient of variation (CV): 0.59787102
Kurtosis: -1.4030931
Mean: 0.59128324
Median Absolute Deviation (MAD): 0.277735
Skewness: -0.38427659
Sum: 8869.2486
Variance: 0.12497011
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0.99854 25
 
0.2%
0.56701 19
 
0.1%
0.59265 17
 
0.1%
0.28216 16
 
0.1%
0.03881 15
 
0.1%
0.97286 14
 
0.1%
0.67222 13
 
0.1%
0.86194 13
 
0.1%
0.61003 11
 
0.1%
0.99474 11
 
0.1%
Other values (11957) 14846
99.0%
ValueCountFrequency (%)
4 × 10⁻⁵ 3
< 0.1%
5 × 10⁻⁵ 2
< 0.1%
7 × 10⁻⁵ 1
 
< 0.1%
0.00011 1
 
< 0.1%
0.00013 2
< 0.1%
0.00014 1
 
< 0.1%
0.00018 1
 
< 0.1%
0.00019 1
 
< 0.1%
0.0002 1
 
< 0.1%
0.00021 2
< 0.1%
ValueCountFrequency (%)
1 3
 
< 0.1%
0.99998 7
< 0.1%
0.99997 3
 
< 0.1%
0.99996 7
< 0.1%
0.99995 5
< 0.1%
0.99994 8
0.1%
0.99993 5
< 0.1%
0.99992 5
< 0.1%
0.99991 4
< 0.1%
0.9999 4
< 0.1%

POSSIBLENterm
Boolean

Constant  Missing 

Distinct: 1
Distinct (%): < 0.1%
Missing: 2890
Missing (%): 19.3%
Memory size: 633.2 KiB

Value      Count  Frequency (%)
True       12110  80.7%
(Missing)   2890  19.3%

Insidesource
Categorical

Constant 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.0 MiB

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Characters and Unicode

Total characters: 120000
Distinct characters: 6
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: TMHMM2.0
2nd row: TMHMM2.0
3rd row: TMHMM2.0
4th row: TMHMM2.0
5th row: TMHMM2.0

Common Values

Value     Count  Frequency (%)
TMHMM2.0  15000  100.0%

Length

[Figure: Histogram of lengths of the category]

[Figure: Common Values (Plot)]

Most occurring characters

ValueCountFrequency (%)
M 45000
37.5%
T 15000
 
12.5%
H 15000
 
12.5%
2 15000
 
12.5%
. 15000
 
12.5%
0 15000
 
12.5%

Most occurring categories

ValueCountFrequency (%)
(unknown) 120000
100.0%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 120000
100.0%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 120000
100.0%

Insidestart
Real number (ℝ)

High correlation 

Distinct: 817
Distinct (%): 5.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 90.841933
Minimum: 1
Maximum: 3783
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 234.4 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
median: 33
Q3: 85
95-th percentile: 461.05
Maximum: 3783
Range: 3782
Interquartile range (IQR): 84

Descriptive statistics

Standard deviation: 184.49293
Coefficient of variation (CV): 2.0309226
Kurtosis: 44.750878
Mean: 90.841933
Median Absolute Deviation (MAD): 32
Skewness: 5.1643474
Sum: 1362629
Variance: 34037.642
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
1 4962
33.1%
27 511
 
3.4%
28 493
 
3.3%
33 419
 
2.8%
24 325
 
2.2%
38 323
 
2.2%
22 244
 
1.6%
43 183
 
1.2%
25 182
 
1.2%
62 146
 
1.0%
Other values (807) 7212
48.1%
ValueCountFrequency (%)
1 4962
33.1%
19 6
 
< 0.1%
20 7
 
< 0.1%
21 2
 
< 0.1%
22 244
 
1.6%
23 145
 
1.0%
24 325
 
2.2%
25 182
 
1.2%
26 51
 
0.3%
27 511
 
3.4%
ValueCountFrequency (%)
3783 1
< 0.1%
3223 1
< 0.1%
2852 1
< 0.1%
2762 1
< 0.1%
2629 1
< 0.1%
2201 1
< 0.1%
2124 1
< 0.1%
2073 1
< 0.1%
2037 1
< 0.1%
2013 1
< 0.1%

Insideend
Real number (ℝ)

High correlation 

Distinct: 889
Distinct (%): 5.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 136.77793
Minimum: 1
Maximum: 3789
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 234.4 KiB

Quantile statistics

Minimum: 1
5-th percentile: 6
Q1: 36
median: 86
Q3: 150
95-th percentile: 510.05
Maximum: 3789
Range: 3788
Interquartile range (IQR): 114

Descriptive statistics

Standard deviation: 197.99709
Coefficient of variation (CV): 1.4475807
Kurtosis: 38.293389
Mean: 136.77793
Median Absolute Deviation (MAD): 56
Skewness: 4.6987253
Sum: 2051669
Variance: 39202.849
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
6 1147
 
7.6%
12 438
 
2.9%
4 398
 
2.7%
11 279
 
1.9%
20 244
 
1.6%
19 153
 
1.0%
8 142
 
0.9%
1 117
 
0.8%
60 115
 
0.8%
67 111
 
0.7%
Other values (879) 11856
79.0%
ValueCountFrequency (%)
1 117
 
0.8%
2 19
 
0.1%
4 398
 
2.7%
6 1147
7.6%
8 142
 
0.9%
10 6
 
< 0.1%
11 279
 
1.9%
12 438
 
2.9%
15 27
 
0.2%
16 55
 
0.4%
ValueCountFrequency (%)
3789 1
< 0.1%
3582 1
< 0.1%
2861 1
< 0.1%
2857 1
< 0.1%
2652 1
< 0.1%
2410 1
< 0.1%
2373 1
< 0.1%
2334 1
< 0.1%
2209 1
< 0.1%
2026 1
< 0.1%

TMhelixsource
Categorical

Constant 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.0 MiB

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Characters and Unicode

Total characters: 120000
Distinct characters: 6
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: TMHMM2.0
2nd row: TMHMM2.0
3rd row: TMHMM2.0
4th row: TMHMM2.0
5th row: TMHMM2.0

Common Values

Value     Count  Frequency (%)
TMHMM2.0  15000  100.0%

Length

[Figure: Histogram of lengths of the category]

[Figure: Common Values (Plot)]

Most occurring characters

ValueCountFrequency (%)
M 45000
37.5%
T 15000
 
12.5%
H 15000
 
12.5%
2 15000
 
12.5%
. 15000
 
12.5%
0 15000
 
12.5%

Most occurring categories

ValueCountFrequency (%)
(unknown) 120000
100.0%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 120000
100.0%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 120000
100.0%

TMhelixstart
Real number (ℝ)

High correlation 

Distinct: 825
Distinct (%): 5.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 94.883667
Minimum: 2
Maximum: 3763
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 234.4 KiB

Quantile statistics

Minimum: 2
5-th percentile: 4
Q1: 10
median: 37
Q3: 90
95-th percentile: 468
Maximum: 3763
Range: 3761
Interquartile range (IQR): 80

Descriptive statistics

Standard deviation: 186.11565
Coefficient of variation (CV): 1.9615141
Kurtosis: 43.503572
Mean: 94.883667
Median Absolute Deviation (MAD): 30
Skewness: 5.1199736
Sum: 1423255
Variance: 34639.034
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
7 1153
 
7.7%
5 1011
 
6.7%
4 976
 
6.5%
10 534
 
3.6%
13 472
 
3.1%
15 345
 
2.3%
20 301
 
2.0%
12 282
 
1.9%
21 244
 
1.6%
39 188
 
1.3%
Other values (815) 9494
63.3%
ValueCountFrequency (%)
2 117
 
0.8%
3 19
 
0.1%
4 976
6.5%
5 1011
6.7%
6 82
 
0.5%
7 1153
7.7%
9 142
 
0.9%
10 534
3.6%
11 22
 
0.1%
12 282
 
1.9%
ValueCountFrequency (%)
3763 1
< 0.1%
3200 1
< 0.1%
2829 1
< 0.1%
2739 1
< 0.1%
2609 1
< 0.1%
2335 1
< 0.1%
2178 1
< 0.1%
2101 1
< 0.1%
2027 1
< 0.1%
2017 1
< 0.1%

TMhelixend
Real number (ℝ)

High correlation 

Distinct: 842
Distinct (%): 5.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 115.74207
Minimum: 18
Maximum: 3782
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 234.4 KiB

Quantile statistics

Minimum: 18
5-th percentile: 24
Q1: 31
median: 58
Q3: 111
95-th percentile: 488.05
Maximum: 3782
Range: 3764
Interquartile range (IQR): 80

Descriptive statistics

Standard deviation: 186.34886
Coefficient of variation (CV): 1.6100357
Kurtosis: 43.331369
Mean: 115.74207
Median Absolute Deviation (MAD): 31
Skewness: 5.1104464
Sum: 1736131
Variance: 34725.896
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
29 840
 
5.6%
26 694
 
4.6%
27 690
 
4.6%
24 472
 
3.1%
32 423
 
2.8%
35 350
 
2.3%
23 271
 
1.8%
37 257
 
1.7%
34 254
 
1.7%
42 251
 
1.7%
Other values (832) 10498
70.0%
ValueCountFrequency (%)
18 3
 
< 0.1%
19 23
 
0.2%
20 10
 
0.1%
21 209
 
1.4%
22 183
 
1.2%
23 271
 
1.8%
24 472
3.1%
25 99
 
0.7%
26 694
4.6%
27 690
4.6%
ValueCountFrequency (%)
3782 1
< 0.1%
3222 1
< 0.1%
2851 1
< 0.1%
2761 1
< 0.1%
2628 1
< 0.1%
2357 1
< 0.1%
2200 1
< 0.1%
2123 1
< 0.1%
2049 1
< 0.1%
2036 1
< 0.1%

Outsidesource
Categorical

Constant 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.0 MiB

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Characters and Unicode

Total characters: 120000
Distinct characters: 6
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: TMHMM2.0
2nd row: TMHMM2.0
3rd row: TMHMM2.0
4th row: TMHMM2.0
5th row: TMHMM2.0

Common Values

Value     Count  Frequency (%)
TMHMM2.0  15000  100.0%

Length

[Figure: Histogram of lengths of the category]

[Figure: Common Values (Plot)]

Most occurring characters

ValueCountFrequency (%)
M 45000
37.5%
T 15000
 
12.5%
H 15000
 
12.5%
2 15000
 
12.5%
. 15000
 
12.5%
0 15000
 
12.5%

Most occurring categories

ValueCountFrequency (%)
(unknown) 120000
100.0%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 120000
100.0%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 120000
100.0%

Outsidestart
Real number (ℝ)

High correlation 

Distinct: 766
Distinct (%): 5.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 88.063533
Minimum: 1
Maximum: 2720
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 234.4 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
median: 36
Q3: 85
95-th percentile: 410.05
Maximum: 2720
Range: 2719
Interquartile range (IQR): 84

Descriptive statistics

Standard deviation: 168.82232
Coefficient of variation (CV): 1.9170514
Kurtosis: 29.391507
Mean: 88.063533
Median Absolute Deviation (MAD): 35
Skewness: 4.5598674
Sum: 1320953
Variance: 28500.976
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
1 3844
25.6%
30 1038
 
6.9%
36 531
 
3.5%
25 530
 
3.5%
27 398
 
2.7%
28 389
 
2.6%
35 322
 
2.1%
44 319
 
2.1%
32 261
 
1.7%
43 200
 
1.3%
Other values (756) 7168
47.8%
ValueCountFrequency (%)
1 3844
25.6%
17 3
 
< 0.1%
18 2
 
< 0.1%
19 5
 
< 0.1%
20 70
 
0.5%
21 18
 
0.1%
22 52
 
0.3%
23 141
 
0.9%
24 33
 
0.2%
25 530
 
3.5%
ValueCountFrequency (%)
2720 1
< 0.1%
2358 1
< 0.1%
2087 1
< 0.1%
2050 1
< 0.1%
2036 1
< 0.1%
2032 1
< 0.1%
1953 1
< 0.1%
1883 1
< 0.1%
1667 1
< 0.1%
1591 1
< 0.1%

Outsideend
Real number (ℝ)

High correlation 

Distinct: 1126
Distinct (%): 7.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 174.0852
Minimum: 3
Maximum: 5055
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 234.4 KiB

Quantile statistics

Minimum: 3
5-th percentile: 3
Q1: 33
median: 80
Q3: 181.25
95-th percentile: 728
Maximum: 5055
Range: 5052
Interquartile range (IQR): 148.25

Descriptive statistics

Standard deviation: 287.17438
Coefficient of variation (CV): 1.6496197
Kurtosis: 29.473674
Mean: 174.0852
Median Absolute Deviation (MAD): 61
Skewness: 4.286865
Sum: 2611278
Variance: 82469.126
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
3 976
 
6.5%
4 613
 
4.1%
9 534
 
3.6%
14 345
 
2.3%
38 169
 
1.1%
19 148
 
1.0%
33 132
 
0.9%
32 126
 
0.8%
39 116
 
0.8%
28 112
 
0.7%
Other values (1116) 11729
78.2%
ValueCountFrequency (%)
3 976
6.5%
4 613
4.1%
5 82
 
0.5%
6 6
 
< 0.1%
9 534
3.6%
10 16
 
0.1%
11 3
 
< 0.1%
12 34
 
0.2%
14 345
 
2.3%
16 4
 
< 0.1%
ValueCountFrequency (%)
5055 1
< 0.1%
4711 1
< 0.1%
3762 1
< 0.1%
3366 1
< 0.1%
3283 1
< 0.1%
3199 1
< 0.1%
3027 1
< 0.1%
2828 1
< 0.1%
2738 1
< 0.1%
2705 1
< 0.1%

Phage_source
Categorical

Distinct: 13
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1008.1 KiB

MGV: 4368
GPD: 3948
TemPhD: 2350
GOV2: 2062
CHVD: 1056
Other values (8): 1216

Length

Max length: 8
Median length: 3
Mean length: 3.8216667
Min length: 3

Characters and Unicode

Total characters: 57325
Distinct characters: 28
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: MGV
2nd row: MGV
3rd row: TemPhD
4th row: TemPhD
5th row: GPD

Common Values

Value             Count  Frequency (%)
MGV                4368  29.1%
GPD                3948  26.3%
TemPhD             2350  15.7%
GOV2               2062  13.7%
CHVD               1056  7.0%
GVD                 443  3.0%
RefSeq              229  1.5%
IGVD                176  1.2%
PhagesDB            170  1.1%
Genbank             106  0.7%
Other values (3)     92  0.6%
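
The frequency column is each count divided by the 15000 observations. A minimal sketch that recomputes it from the counts in the Common Values table (the dictionary simply restates that table; nothing here comes from the profiling tool itself):

```python
# Counts copied from the Phage_source "Common Values" table.
counts = {
    "MGV": 4368,
    "GPD": 3948,
    "TemPhD": 2350,
    "GOV2": 2062,
    "CHVD": 1056,
    "GVD": 443,
    "RefSeq": 229,
    "IGVD": 176,
    "PhagesDB": 170,
    "Genbank": 106,
    "Other values (3)": 92,
}

# The counts should add up to the number of observations.
total = sum(counts.values())

# Percentage of rows per source, rounded to one decimal as in the report.
percentages = {name: f"{100 * n / total:.1f}%" for name, n in counts.items()}

for name, pct in percentages.items():
    print(f"{name:<18}{counts[name]:>6}  {pct}")
```

The two largest sources, MGV and GPD, together account for a little over half of the rows.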

Length

[Figure: Histogram of lengths of the category]

Most occurring characters

ValueCountFrequency (%)
G 11103
19.4%
V 8177
14.3%
D 8165
14.2%
P 6468
11.3%
M 4377
 
7.6%
e 3084
 
5.4%
h 2520
 
4.4%
T 2422
 
4.2%
m 2350
 
4.1%
O 2062
 
3.6%
Other values (18) 6597
11.5%

Most occurring categories

ValueCountFrequency (%)
(unknown) 57325
100.0%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 57325
100.0%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 57325
100.0%

Interactions

[Figures: interaction plots (images not preserved in the extracted text)]
2025-07-15T13:52:44.259049image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:34.110082image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:35.139080image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:35.972641image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:36.802908image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:37.833923image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:38.695356image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:39.526757image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:40.570308image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:41.429087image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:42.195532image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:44.329705image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:34.196533image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:35.217250image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:36.040745image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:36.877866image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:37.908751image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:38.771439image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:39.627424image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:40.647185image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:41.499266image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:42.263075image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:44.397977image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:34.271501image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:35.291591image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:36.126886image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:36.948670image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:37.996805image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:38.843834image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:39.701671image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:40.728736image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:41.568216image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:42.333193image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:44.465262image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:34.357277image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:35.364799image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:36.200581image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:37.020836image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:38.067832image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:38.919511image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:39.773473image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:40.804683image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:41.643863image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:42.404359image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:44.535671image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:34.428957image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:35.433396image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:36.288396image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:37.090892image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:38.177347image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:38.997043image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:39.843181image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:40.879585image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:41.714499image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:43.560348image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:44.607249image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:34.501119image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:35.520886image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:36.364030image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:37.157050image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:38.247740image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:39.099079image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:39.910003image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:40.949297image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:41.787658image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-15T13:52:43.623350image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/

Correlations

Variable | Expnumberfirst60AAs | ExpnumberofAAsinTMHs | Insideend | Insidestart | Length | Outsideend | Outsidestart | Phage_source | PredictedTMHsNumber | TMhelixend | TMhelixstart | TotalprobofNin
Expnumberfirst60AAs | 1.000 | 0.407 | -0.165 | 0.136 | -0.358 | -0.334 | -0.130 | 0.048 | 0.356 | -0.145 | -0.164 | 0.157
ExpnumberofAAsinTMHs | 0.407 | 1.000 | 0.513 | 0.732 | 0.256 | 0.310 | 0.581 | 0.054 | 0.874 | 0.691 | 0.671 | 0.090
Insideend | -0.165 | 0.513 | 1.000 | 0.778 | 0.512 | 0.108 | 0.294 | 0.042 | 0.532 | 0.669 | 0.662 | -0.250
Insidestart | 0.136 | 0.732 | 0.778 | 1.000 | 0.272 | 0.098 | 0.251 | 0.040 | 0.784 | 0.658 | 0.652 | -0.180
Length | -0.358 | 0.256 | 0.512 | 0.272 | 1.000 | 0.737 | 0.441 | 0.036 | 0.252 | 0.476 | 0.480 | 0.001
Outsideend | -0.334 | 0.310 | 0.108 | 0.098 | 0.737 | 1.000 | 0.731 | 0.037 | 0.296 | 0.596 | 0.608 | 0.246
Outsidestart | -0.130 | 0.581 | 0.294 | 0.251 | 0.441 | 0.731 | 1.000 | 0.052 | 0.613 | 0.766 | 0.768 | 0.281
Phage_source | 0.048 | 0.054 | 0.042 | 0.040 | 0.036 | 0.037 | 0.052 | 1.000 | 0.048 | 0.041 | 0.041 | 0.029
PredictedTMHsNumber | 0.356 | 0.874 | 0.532 | 0.784 | 0.252 | 0.296 | 0.613 | 0.048 | 1.000 | 0.677 | 0.679 | 0.101
TMhelixend | -0.145 | 0.691 | 0.669 | 0.658 | 0.476 | 0.596 | 0.766 | 0.041 | 0.677 | 1.000 | 0.994 | 0.093
TMhelixstart | -0.164 | 0.671 | 0.662 | 0.652 | 0.480 | 0.608 | 0.768 | 0.041 | 0.679 | 0.994 | 1.000 | 0.102
TotalprobofNin | 0.157 | 0.090 | -0.250 | -0.180 | 0.001 | 0.246 | 0.281 | 0.029 | 0.101 | 0.093 | 0.102 | 1.000
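
The strongest relationships in a correlation matrix like this one can be listed programmatically. The following is a minimal sketch in plain Python, using coefficients copied from the table for four of the variables. The helper name `strong_pairs` and the 0.5 cutoff are illustrative assumptions; they are not the threshold or the method ydata-profiling uses to raise its "High correlation" alerts.

```python
# Coefficients copied from the correlation table for four of the variables.
names = ["Length", "Outsideend", "TMhelixstart", "TMhelixend"]
corr = [
    [1.000, 0.737, 0.480, 0.476],
    [0.737, 1.000, 0.608, 0.596],
    [0.480, 0.608, 1.000, 0.994],
    [0.476, 0.596, 0.994, 1.000],
]


def strong_pairs(names, corr, threshold=0.5):
    """Return (a, b, value) for each unordered pair with |value| >= threshold.

    The threshold is an assumed cutoff chosen for illustration only.
    """
    pairs = []
    for i in range(len(names)):
        # Only look above the diagonal: the matrix is symmetric.
        for j in range(i + 1, len(names)):
            if abs(corr[i][j]) >= threshold:
                pairs.append((names[i], names[j], corr[i][j]))
    # Strongest relationships first.
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)


pairs = strong_pairs(names, corr)
for a, b, value in pairs:
    print(f"{a} ~ {b}: {value:.3f}")
```

With these four variables, the pair TMhelixstart and TMhelixend (0.994) comes out on top, which is expected since both describe the position of the same helix.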

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completeness.
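
The per-column nullity count behind these plots can be reproduced with a few lines of code. This is a minimal sketch in plain Python, not the report's own implementation: the helper names `is_missing` and `nullity_by_column` are hypothetical, and the three rows are a reduced excerpt (two columns only) of rows shown in the Sample section.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def nullity_by_column(rows):
    """Count missing values per column across a list of row dictionaries."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            counts[column] = counts.get(column, 0) + (1 if is_missing(value) else 0)
    return counts


# Two columns from three of the sampled rows; NaN marks a missing POSSIBLENterm.
rows = [
    {"Length": 107, "POSSIBLENterm": True},
    {"Length": 571, "POSSIBLENterm": float("nan")},
    {"Length": 66, "POSSIBLENterm": True},
]

counts = nullity_by_column(rows)
print(counts)
```

On the full dataset, the report's alerts give 2890 missing POSSIBLENterm values out of 15000 observations (19.3%); no other column is flagged as missing.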

Sample

(index) | Phage_ID | Protein_ID | Length | PredictedTMHsNumber | ExpnumberofAAsinTMHs | Expnumberfirst60AAs | TotalprobofNin | POSSIBLENterm | Insidesource | Insidestart | Insideend | TMhelixsource | TMhelixstart | TMhelixend | Outsidesource | Outsidestart | Outsideend | Phage_source
1392473 | MGV-GENOME-0377366 | MGV-GENOME-0377366_94 | 107 | 2 | 39.89314 | 39.89314 | 0.99747 | True | TMHMM2.0 | 53.0 | 107.0 | TMHMM2.0 | 33.0 | 52.0 | TMHMM2.0 | 30.0 | 32.0 | MGV
1225655 | MGV-GENOME-0228589 | MGV-GENOME-0228589_3 | 204 | 2 | 45.03472 | 42.87273 | 0.99861 | True | TMHMM2.0 | 62.0 | 204.0 | TMHMM2.0 | 39.0 | 61.0 | TMHMM2.0 | 30.0 | 38.0 | MGV
2065853 | TemPhD_cluster_54944 | TemPhD_cluster_54944_50 | 108 | 1 | 22.15821 | 21.19442 | 0.14212 | True | TMHMM2.0 | 43.0 | 108.0 | TMHMM2.0 | 24.0 | 42.0 | TMHMM2.0 | 1.0 | 23.0 | TemPhD
1828787 | TemPhD_cluster_21940 | TemPhD_cluster_21940_29 | 41 | 1 | 22.60786 | 22.60786 | 0.35682 | True | TMHMM2.0 | 38.0 | 41.0 | TMHMM2.0 | 15.0 | 37.0 | TMHMM2.0 | 1.0 | 14.0 | TemPhD
575773 | uvig_280215 | uvig_280215_16 | 571 | 1 | 22.92261 | 0.00000 | 0.91546 | NaN | TMHMM2.0 | 1.0 | 169.0 | TMHMM2.0 | 170.0 | 192.0 | TMHMM2.0 | 193.0 | 571.0 | GPD
1869556 | TemPhD_cluster_2820 | TemPhD_cluster_2820_6 | 66 | 1 | 19.76984 | 19.76956 | 0.91180 | True | TMHMM2.0 | 1.0 | 6.0 | TMHMM2.0 | 7.0 | 26.0 | TMHMM2.0 | 27.0 | 66.0 | TemPhD
717310 | uvig_396803 | uvig_396803_67 | 183 | 1 | 18.65302 | 18.45494 | 0.74857 | True | TMHMM2.0 | 1.0 | 6.0 | TMHMM2.0 | 7.0 | 25.0 | TMHMM2.0 | 26.0 | 183.0 | GPD
1193825 | MGV-GENOME-0085121 | MGV-GENOME-0085121_16 | 98 | 2 | 42.31297 | 40.90727 | 0.98844 | True | TMHMM2.0 | 62.0 | 98.0 | TMHMM2.0 | 44.0 | 61.0 | TMHMM2.0 | 35.0 | 43.0 | MGV
1524904 | MGV-GENOME-0378116 | MGV-GENOME-0378116_30 | 122 | 2 | 41.78412 | 23.94402 | 0.91384 | True | TMHMM2.0 | 81.0 | 122.0 | TMHMM2.0 | 58.0 | 80.0 | TMHMM2.0 | 44.0 | 57.0 | MGV
1051224 | MGV-GENOME-0357329 | MGV-GENOME-0357329_12 | 169 | 3 | 64.41947 | 27.06577 | 0.30090 | True | TMHMM2.0 | 119.0 | 169.0 | TMHMM2.0 | 96.0 | 118.0 | TMHMM2.0 | 93.0 | 95.0 | MGV

(index) | Phage_ID | Protein_ID | Length | PredictedTMHsNumber | ExpnumberofAAsinTMHs | Expnumberfirst60AAs | TotalprobofNin | POSSIBLENterm | Insidesource | Insidestart | Insideend | TMhelixsource | TMhelixstart | TMhelixend | Outsidesource | Outsidestart | Outsideend | Phage_source
1441688 | MGV-GENOME-0273252 | MGV-GENOME-0273252_29 | 79 | 2 | 40.91968 | 40.91747 | 0.93822 | True | TMHMM2.0 | 54.0 | 79.0 | TMHMM2.0 | 34.0 | 53.0 | TMHMM2.0 | 25.0 | 33.0 | MGV
2103306 | TemPhD_cluster_57437 | TemPhD_cluster_57437_56 | 98 | 3 | 62.49815 | 36.02355 | 0.99237 | True | TMHMM2.0 | 67.0 | 72.0 | TMHMM2.0 | 73.0 | 95.0 | TMHMM2.0 | 96.0 | 98.0 | TemPhD
1091401 | MGV-GENOME-0271360 | MGV-GENOME-0271360_27 | 538 | 3 | 70.80607 | 15.07114 | 0.22167 | True | TMHMM2.0 | 521.0 | 538.0 | TMHMM2.0 | 503.0 | 520.0 | TMHMM2.0 | 489.0 | 502.0 | MGV
984303 | MGV-GENOME-0355129 | MGV-GENOME-0355129_34 | 151 | 2 | 48.30061 | 4.82098 | 0.87113 | NaN | TMHMM2.0 | 151.0 | 151.0 | TMHMM2.0 | 131.0 | 150.0 | TMHMM2.0 | 96.0 | 130.0 | MGV
1079974 | MGV-GENOME-4416057 | MGV-GENOME-4416057_42 | 148 | 2 | 37.63759 | 19.39025 | 0.07251 | True | TMHMM2.0 | 25.0 | 119.0 | TMHMM2.0 | 120.0 | 142.0 | TMHMM2.0 | 143.0 | 148.0 | MGV
198485 | uvig_2223 | uvig_2223_77 | 169 | 3 | 64.68193 | 26.04394 | 0.22898 | True | TMHMM2.0 | 119.0 | 169.0 | TMHMM2.0 | 96.0 | 118.0 | TMHMM2.0 | 93.0 | 95.0 | GPD
967090 | MGV-GENOME-0379330 | MGV-GENOME-0379330_317 | 225 | 1 | 22.53788 | 22.46349 | 0.09151 | True | TMHMM2.0 | 51.0 | 225.0 | TMHMM2.0 | 28.0 | 50.0 | TMHMM2.0 | 1.0 | 27.0 | MGV
2171878 | TemPhD_cluster_8005 | TemPhD_cluster_8005_19 | 97 | 2 | 44.01212 | 24.09761 | 0.99159 | True | TMHMM2.0 | 81.0 | 97.0 | TMHMM2.0 | 58.0 | 80.0 | TMHMM2.0 | 44.0 | 57.0 | TemPhD
1789301 | TemPhD_cluster_16210 | TemPhD_cluster_16210_56 | 58 | 1 | 21.82124 | 21.82124 | 0.11086 | True | TMHMM2.0 | 27.0 | 58.0 | TMHMM2.0 | 4.0 | 26.0 | TMHMM2.0 | 1.0 | 3.0 | TemPhD
2770945 | Station155_DCM_ALL_assembly_NODE_4760_length_17116_cov_5.143016 | Station155_DCM_ALL_assembly_NODE_4760_length_17116_cov_5.143016_14 | 77 | 1 | 21.46824 | 11.34032 | 0.93221 | True | TMHMM2.0 | 1.0 | 49.0 | TMHMM2.0 | 50.0 | 72.0 | TMHMM2.0 | 73.0 | 77.0 | GOV2